4  Reproducible Research

Setting up simulation studies that everyone can replicate and reproduce

Lecture 4

This week’s slides can be found here


4.1 Introduction

Dear all,

This week, it’s all coming together: you will apply your newly acquired markup language skills and build a reproducible research archive from scratch. We will assume that, by now, you’re familiar with Git and GitHub. If that’s not the case, please refer to last weeks’ course materials.

The contents of the research compendium will consist of the simulation study performed by Boulesteix, Groenwold, Abrahamowicz, et al. (2020). In this week’s exercises, you will make the simulation study easier to reproduce. By the end of the exercises you’ve created an improved version of how this simulation may be represented in the online supplement.

All the best,

Hanne and Gerko


4.2 Preparation

4.2.1 Required reading

Read Boulesteix, Groenwold, Abrahamowicz, et al. (2020).

Most of the paper may be skimmed through, but make sure to closely read the penultimate section named “An example of a statistical simulation”, since this will be this week’s case study.

4.2.2 Optional reading

The interested reader may also consider reading the original Morris, White, & Crowther (2019) paper, that Boulesteix et al. (2020) is based on. This shouldn’t be too hard, because the target audience of ‘Using simulation studies to evaluate statistical methods’ overlaps significantly with graduate students in statistics. And moreover, you might find it extremely useful for your Master’s thesis!


4.3 Exercise 1

In this exercise you will follow the Utrecht University tutorial on creating reproducible manuscripts: https://utrechtuniversity.github.io/workshop-reproducible-manuscripts/introduction.html. The link intentionally refers to the section named ‘Introduction’; previous sections are not relevant for you. Follow the instructions in the sections ‘Introduction’, ‘Getting Started’, ‘Writing’, and ‘Code’.

In the section ‘Rendering’, you may skip the instructions for creating docx documents, using the terminal, and managing multiple file outputs. We will only need the pdf output.

4.3.1 Hint

After this exercise, the YAML header for your Quarto document might look something like the code in the footnote1.


4.4 Exercise 2

In this exercise, you will substitute the contents of the research compendium you’ve just created with Boulesteix et al.’s simulation study.

The original simulation script was supplemented as a pdf file. Lucky for you, we’ve already copied all of the R code from their pdf supplement for you. The resulting R script is available from https://github.com/gerkovink/markup/blob/main/documents/week-4/original_script.R.

Likewise, we’ve retrieved the necessary data for you. The complete database can be accessed through https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?Begi-nYear=2015, whereas the relevant data files for this exercise can be downloaded directly from https://github.com/gerkovink/markup/blob/main/documents/week-4/raw_data.

Now it’s up to you! Create a reproducible document that contains the simulation study and results. Make sure that this document is part of a fully reproducible research compendium.

Submit the research compendium to this year’s Markup Languages repository with a pull request on GitHub.

4.4.1 Hint

Exchange research compendiums with a classmate to make sure that your work is indeed completely reproducible.


  1. ---
    title: "My Reproducible Manuscript"
    format: 
    pdf: default
    bibliography: references.bib
    ---
    ↩︎